We empirically investigate distributions of individual consumption expenditure for four 
commodity categories conditional on fixed income levels. The data stems from the Family 
Expenditure Survey carried out annually in the United Kingdom. We use graphical tech- 
niques to test for normality and lognormality of these distributions. While mainstream 
economic theory does not predict any structure for these distributions, we find that in 
at least three commodity categories individual consumption expenditure conditional on a 
fixed income level is lognormally distributed. 

I. Introduction 

Probabilistic concepts have been fundamental to economic theories of financial
markets for almost a century, and these markets have received much attention
from the econophysics community in recent years. With a few exceptions, this
has not been the case for other markets, e.g. commodity^1 markets, although
they might deserve broader attention from econophysicists and be a fruitful
application area for methods of computational statistical physics as well.

^1 Following economic terminology, we use the word commodity for goods and services.

The prevailing economic framework for describing markets for commodities is
General Equilibrium Theory. With its main results established about fifty years
ago, it continues to be a fundamental paradigm in economic thought. Its starting
point is a set A of agents and a space of individual consumption plans
$\mathbb{R}^{\ell}$. Each agent $a \in A$ chooses a vector
$q \in \mathbb{R}^{\ell}$ (with coordinate $q_i$ denoting the quantity of
commodity $i$ she wants to consume) as the maximal element with respect to an
order relation over $\mathbb{R}^{\ell}$, called her preference or taste, subject
to the restriction that it must be affordable to her at the prevailing price
system $p \in \mathbb{R}^{\ell}$ (with $p_i$ denoting the price for commodity
$i$). Choices of firms regarding supply of commodities are modelled in a similar
manner. The main success of General Equilibrium Theory has been to prove the
existence of a price system $p^*$ equilibrating aggregate demand and supply on
the market for each commodity given any specification of preferences and income
on the set A under very mild assumptions
on the set V of admissible individual preferences. Thus it has established
a rigorous framework for understanding how a decentralized economic system,
where individuals and firms decide in a seemingly uncoordinated fashion
about their individual demands and supplies, can become self-coordinated by
the price system. However, General Equilibrium Theory is an intrinsically
static concept in many respects and therefore not capable of explaining some
important issues. One major point is that it does not derive endogenously
the shape and the dynamics of the distribution of agents' characteristics, like
preferences, expectations and income. Because the space of admissible
distributions of characteristics is not restricted in the model, General
Equilibrium Theory has too little structure to produce empirically testable
predictions on market outcomes.^2 Already in 1974, a Markov Random Field model
with a finite subset of V as state space was presented from which,
leaving aside some technical difficulties, distributions of preferences can be
derived. Unfortunately, this approach received little attention in the
mainstream economics community despite the contention from the social sciences
that interaction between consumers is likely to be an important factor
determining consumption decisions. As a result, to our knowledge no attempts
have been made to derive empirically testable predictions from probabilistic
models of
preference dynamics. In this paper, we document empirical regularities in
the distributions of individual cross-section consumption expenditure which
might suggest that heterogeneity of individual consumption expenditure for
certain groups of commodities is indeed governed by a common stochastic
mechanism.

^2 The problem lies in the fact that a distribution of preferences cannot be
observed empirically, because to do so we would need to know consumers'
consumption decisions for a large set of price systems
$p \in \mathbb{R}^{\ell}$. (While relative prices do change over time, so do
preferences; as a result, a ceteris paribus condition cannot be secured.) On the
other hand, the distribution of income is observable and substantial progress
has been achieved in General Equilibrium Theory by taking it into account.
However, this approach relies on ad-hoc assumptions on the distribution of
preferences, the validity of which cannot be tested within the prevailing
theoretical framework.

II. Data and methodology

Our expenditure data is provided by the Family Expenditure Survey, carried
out annually since 1957 in the United Kingdom. The survey is based on a
representative sample of about 7000 households, which amounts to 0.05% of
all households in the United Kingdom. A household comprises one person
living alone or a group of people living at the same address. Each household
contributes information about its total income and its total expenditure for
goods and services in a time period of two weeks. Related types of goods
and services are grouped into nine categories and expenditures are aggregated
within each category. Information about expenditures is obtained partly from
records kept by individual members, partly by interview in the case of periodic
expenditures. Details of income are obtained by interview. The periods for
the record book and interviews are spread evenly over the year. Additionally,
household characteristics like the number of household members, the
age of the household head etc. are recorded. We confine our analysis to
the categories Services, Fuel (comprising fuel, light and power), Food
(comprising food and nonalcoholic beverages) and Travel (comprising transport
and vehicles). For each of these categories, we investigate the distribution
of expenditure within an annual sample. It is obvious that income is an
important determinant of expenditure. However, we want to exclude the effect
of income heterogeneity^3 to focus solely on heterogeneity of tastes.
Therefore we aim at estimating the distribution of consumption expenditure for
a fixed value of income rather than in the whole sample. Clearly, we have
to base our estimation on subsamples consisting of observations from narrow
income intervals. Narrowing down these intervals is limited by the need to
have a sufficient number of points in a subsample. Furthermore, we include
in one subsample only observations from households with a common number
of persons, because consumption patterns presumably vary with the size
of households. We choose from each annual sample four subsamples with a
width of about 0.3% of the total income spectrum. Within each interval we
estimated the income distribution using nonparametric techniques. Income
tends to be spread evenly within these intervals with no notable regularities.
To eliminate the effects of remaining income variance, we corrected individual
observations based on the slope of the Engel curve, i.e. the regression of
consumption on income. With this slope being in most cases in
the range between -0.1 and 0.3, we found that this procedure has negligible
effect on the results except for some smoothing. Income and consumption
expenditure do not correlate in any of the corrected subsample datasets. In
summary, the stratification procedure resulted in 37 subsamples, comprising
22 subsamples with one-person households and 15 with two-person households,
with the number of observations in each subsample between 300 and 700.

^3 The shape and the origin of the income distribution constitute an extremely
interesting research topic, but it is important to stress that regularities
present in the distribution of income are not related to the regularities we
aim at in this paper.

In a first step, we used nonparametric density estimation techniques to
get qualitative information on the shape of the density functions. In a second
step, we used probability plotting to investigate the functional type of the
distributions. In probability plotting, the values of the empirical distribution
function are transformed in such a way that, if the hypothesized distribution
is the true underlying distribution, they follow a straight line (within
sampling error) when plotted against the observed realizations of the random
variable x. Assume the true distribution is F with mean $\mu$ and variance
$\sigma^2$. We write

$$F(x) = G\left(\frac{x - \mu}{\sigma}\right) = G(z) \qquad (1)$$

If we plot $z = G^{-1}(F(x)) = (x - \mu)/\sigma$ against x, the resulting plot
will be a straight line. Probability plotting displays
$z_i = G^{-1}(F_n(x_{(i)}))$ on $x_{(i)}$ with the empirical distribution
function

$$F_n(x_{(i)}) = \frac{i - 0.5}{n} \qquad (2)$$

and the ordered observations $x_{(1)} < \ldots < x_{(n)}$. If the hypothesized
distribution is normal, the recommended transformation of $F_n(x)$ is

$$z = \mathrm{sign}(F_n(x) - 0.5)\,\bigl(1.238\,t\,(1 + 0.0262\,t)\bigr) \qquad (3)$$

with

$$t = \{-\ln[4 F_n(x)(1 - F_n(x))]\}^{1/2} \qquad (4)$$

and z is plotted against x. If the hypothesized distribution is lognormal, the
same transformation is applied to $F_n(x)$ and z is plotted against $\ln x$.
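
As an illustration only, the probability-plotting transformation described above might be sketched in Python as follows. The function names, the synthetic lognormal sample and the use of a correlation coefficient as a numerical summary of straightness are our assumptions for this sketch; they are not part of the analysis reported here, which relies on visual inspection of the plots.

```python
import numpy as np

def plotting_positions(n):
    """Empirical distribution function at the ordered observations: (i - 0.5) / n."""
    i = np.arange(1, n + 1)
    return (i - 0.5) / n

def normal_score(f):
    """Approximate standard normal quantile of f using the closed-form transformation."""
    # t = sqrt(-ln[4 f (1 - f)]); the maximum() guards against tiny negative round-off.
    t = np.sqrt(np.maximum(-np.log(4.0 * f * (1.0 - f)), 0.0))
    return np.sign(f - 0.5) * 1.238 * t * (1.0 + 0.0262 * t)

def probability_plot(x, lognormal=False):
    """Return (abscissa, z); points near a straight line support the hypothesis."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    abscissa = np.log(x_sorted) if lognormal else x_sorted
    z = normal_score(plotting_positions(x_sorted.size))
    return abscissa, z

def linearity(abscissa, z):
    """Correlation coefficient of the plot (an assumed summary, not a formal test)."""
    return np.corrcoef(abscissa, z)[0, 1]

# Synthetic sample standing in for one subsample; NOT survey data.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=0.6, size=500)

r_normal = linearity(*probability_plot(sample, lognormal=False))
r_lognormal = linearity(*probability_plot(sample, lognormal=True))
print(f"normal plot r = {r_normal:.4f}, lognormal plot r = {r_lognormal:.4f}")
```

For a lognormal sample one would expect the lognormal plot to be visibly straighter than the normal plot; the sample size of 500 is chosen to lie within the range of subsample sizes reported in this section.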

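Returning to the income correction described earlier in this section, the following is a minimal sketch under stated assumptions. The text specifies only that observations are corrected using the slope of the Engel curve; the linear least-squares fit within a subsample, the choice of the subsample mean as reference income, the function name `engel_correct` and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def engel_correct(income, expenditure, reference_income=None):
    """Shift each expenditure to a common reference income along a fitted line.

    Assumption: the Engel curve is approximated by an ordinary least-squares
    line within the (narrow) income interval of one subsample.
    """
    income = np.asarray(income, dtype=float)
    expenditure = np.asarray(expenditure, dtype=float)
    if reference_income is None:
        reference_income = income.mean()
    # np.polyfit with deg=1 returns (slope, intercept), highest power first.
    slope, _intercept = np.polyfit(income, expenditure, deg=1)
    corrected = expenditure - slope * (income - reference_income)
    return corrected, slope

# Synthetic subsample: narrow income interval, lognormal "taste" component.
rng = np.random.default_rng(1)
income = rng.uniform(100.0, 150.0, size=400)
taste = rng.lognormal(mean=0.0, sigma=0.5, size=400)
expenditure = 20.0 * taste + 0.2 * (income - 125.0)

corrected, slope = engel_correct(income, expenditure)
residual_corr = np.corrcoef(income, corrected)[0, 1]
print(f"estimated slope = {slope:.3f}, correlation after correction = {residual_corr:.2e}")
```

With a least-squares slope, the corrected expenditures are uncorrelated with income by construction, which is consistent with the statement that income and consumption expenditure do not correlate in the corrected subsamples.
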
III. Results 

Nonparametric estimates show that for each category and in all subsamples 
the distribution of consumption expenditure is unimodal. The estimated
density function oscillates in the tails due to limited sample size. The
magnitude of these oscillations is similar to that in nonparametric density
estimates of Monte-Carlo generated samples from lognormal distributions
presented in the literature. For the commodity categories Services, Fuel and Travel, the
nonparametric density estimates indicate that the distributions of expen- 
diture are skewed to the right (see top of Figures 1, 2, 3 for representative 
plots). In the category Food, the estimated distributions appear to be slightly 
skewed to the right. With these preliminary findings, we tested the data of 
each subsample and for each category for normality and lognormality using 
probability plotting. In the categories Services, Fuel and Travel, the values
obtained by formulae (3) and (4) follow a straight line in lognormal proba- 
bility plots within sampling error indicating lognormality of the distributions 
(see bottom of Figures 1, 2, 3 for representative plots). In a few plots there 
are outliers present in the upper and lower ends of the distribution which
appear to result from contamination. Based on probability plots of Monte-Carlo
generated samples from mixtures of two lognormal distributions presented in the
literature, we concluded that the weight of a possible contaminant
distribution is less than 0.2. For the category Food, it is difficult to
distinguish between normality and lognormality from probability plotting. In
normal probability plotting, we obtain in most subsamples a slightly concave
curve, indicating that the distribution is skewed to the right, while obtaining
a straight line in the remaining cases, indicating normality. In the former
instance, the sample points follow a slightly convex curve in lognormal
probability plotting, indicating a deviation from lognormality towards normality.
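
The effect of contamination on a lognormal probability plot can be explored with a small Monte-Carlo experiment. The sketch below is illustrative only: the parameters of the main and contaminant distributions, the weights tried and the function names are hypothetical, and the correlation coefficient is merely an assumed numerical stand-in for the visual straightness of the plot.

```python
import numpy as np

def lognormal_plot_r(x):
    """Correlation between ln x and the transformed plotting positions z."""
    x = np.sort(np.asarray(x, dtype=float))
    f = (np.arange(1, x.size + 1) - 0.5) / x.size
    t = np.sqrt(np.maximum(-np.log(4.0 * f * (1.0 - f)), 0.0))
    z = np.sign(f - 0.5) * 1.238 * t * (1.0 + 0.0262 * t)
    return np.corrcoef(np.log(x), z)[0, 1]

def contaminated_sample(rng, n, weight, main=(3.0, 0.5), contaminant=(5.0, 0.5)):
    """Draw n points from a mixture of two lognormals.

    `weight` is the probability that a point comes from the contaminant;
    `main` and `contaminant` are (mean, sigma) of ln x (hypothetical values).
    """
    from_contaminant = rng.random(n) < weight
    mu = np.where(from_contaminant, contaminant[0], main[0])
    sigma = np.where(from_contaminant, contaminant[1], main[1])
    return np.exp(rng.normal(mu, sigma))

rng = np.random.default_rng(2)
results = {w: lognormal_plot_r(contaminated_sample(rng, 500, w))
           for w in (0.0, 0.05, 0.2)}
for w, r in results.items():
    print(f"contaminant weight {w:.2f}: lognormal plot r = {r:.4f}")
```

Plotting `np.log(x)` against `z` for each weight, rather than reducing the plot to a single number, is closer in spirit to the graphical comparison described above.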

IV. Discussion 

We see two potential explanations for the regularities found in this paper. 
First, the distributions might originate simply from fluctuations inherent in 
the process by which the data is obtained. The reported individual expendi- 
tures within a given category involve adding amounts from many instances of 
trading for goods and services of many types and brands. However, the fact 
that in at least three commodity categories we find lognormal distributions makes it
unlikely that the regularities are attributable solely to random fluctuations. 
By the Central Limit Theorem a lognormal distribution of an observable 
would originate from a multiplicative process involving stochastically inde- 
pendent fluctuation on each stage, but we do not see how a multiplicative 
process could be involved in the process by which our data is obtained. There- 
fore we suggest that the observed regularities have a second - and deeper - 
origin in a stochastic process governing the heterogeneity of individual tastes.^4
Lognormal distributions are ubiquitous in natural sciences where their
origin is some structure of the underlying system. The question of whether 
the regularities in consumption data have some deeper origin lying in the 
structure of socioeconomic systems is presumably worth exploring. 
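
The Central Limit Theorem argument above can be illustrated numerically: sums of many independent fluctuations tend towards a normal distribution, whereas products tend towards a lognormal one, because the logarithm of a product is a sum. The sketch below is a toy simulation, not a model of the survey process; the uniform shocks, the number of stages and the use of sample skewness as a diagnostic are our assumptions.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment over variance to the power 3/2."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

rng = np.random.default_rng(3)
n_agents, n_stages = 5000, 50

# Hypothetical independent positive fluctuations, one per agent and stage.
shocks = rng.uniform(0.8, 1.25, size=(n_agents, n_stages))

additive = shocks.sum(axis=1)         # adding many amounts
multiplicative = shocks.prod(axis=1)  # compounding many factors

print(f"skewness of sums:            {skewness(additive):+.3f}")
print(f"skewness of products:        {skewness(multiplicative):+.3f}")
print(f"skewness of log of products: {skewness(np.log(multiplicative)):+.3f}")
```

In this toy setting the sums are close to symmetric, the products are strongly skewed to the right, and the logarithms of the products are again close to symmetric, as expected for an approximately lognormal variable.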

The authors thank J. Arns for extracting the data from the Family Expen- 
diture Survey. We are indebted to W. Hildenbrand and D. Stauffer for many 
insightful discussions. S.P. gratefully acknowledges financial support from 
the Graduiertenförderung, University of Bonn. 

^4 In economic theory, heterogeneity of tastes would be formalised as a distribution on
an infinite dimensional space of functions representing preferences, as outlined in the
introduction. However, the notion of preferences is a hypothetical concept and one might
well doubt their existence if this concept does not provide testable predictions.

Figure 1: Representative plots for the category Services; top: nonparametric
density estimates for the subsamples: 1987, 1 person, income 40-70
(diamonds) and 1986, 2 persons, income 100-150; bottom: lognormal
probability plots for the same subsamples

Figure 2: Representative plots for the category Fuel; top: nonparametric
density estimates for the subsamples: 1988, 1 person, income 70-100
(diamonds) and 1992, 2 persons, income 200-250; bottom: lognormal
probability plots for the same subsamples

Figure 3: Representative plots for the category Travel; top: nonparametric
density estimates for the subsamples: 1988, 2 persons, income 150-200
(diamonds) and 1992, 2 persons, income 200-250; bottom: lognormal
probability plots for the same subsamples